Abstract

The goal of this report was to test the use of sensor-based skill measures in evaluating
performance differences in rifle marksmanship. Ten shots were collected from each of 30 novices
and 9 experts. Three measures of breath control and one of trigger control were used to
predict skill classification. The data were fitted with a logistic regression model, and holdout
validation was used to assess the quality of the model's classifications. Individually, all four
measures were significant; when considered together, only three measures were
significant predictors of level of expertise (p < .05). Overall percent correct in shot
classification for the testing data was 90.0%, with 67.5% sensitivity and 96.0%
specificity.



Introduction 

Rifle marksmanship is an inherently complex task. Shooters must position various body
parts to achieve maximum rifle support and at the same time establish and maintain proper
sight alignment and correct sight picture, all prior to initiating the coordinated steps
necessary in the execution of a shot (Chung, Delacruz, de Vries, Bewley, & Baker, 2006).

Skilled shooters have been found to hold a rifle steadier than unskilled
shooters (McGuigan & MacCaslin, 1955). Similarly, research on pistol shooting found that,
while novices and experts shared a single dominant pattern of movement, experts tended
to hold their bodies in similar positions, favoring those that minimized the effects of
movement on the target (Penn State, 2001). These findings suggest that skilled shooters
minimize body movements through proper positioning.

In studies dealing with marksmanship, such as correlation studies between simulators
and live fire (Hagman, 1998; Schendel, Heller, Finley, & Hawley, 1985; Smith & Hagman,
2000), studies on the impact of nutrition on performance (Tharion & Moore, 1993), and
studies on the role of anxiety for novices (Chung, O’Neil, Delacruz, & Bewley, 2005), assessment of
marksmanship performance has relied on shot placement (e.g., score, accuracy, tightness of
shots) to make judgments about a shooter’s skill level. Although appropriate as a broad
measure of relative ability, evaluation solely on the basis of shot placement carries with it the
potential to conceal underlying differences in shooter skill. In the context of training, using
such an outcome measure as a metric can be detrimental, making identification and
subsequent remediation of problematic aspects of performance difficult, if not impossible
(AERA, APA, & NCME, 1999; Wiggins, 1998).

*We would like to thank the staff at Camp Pendleton WTBN. We would also like to thank the following people
from UCLA/CRESST: Joanne Michiuye for her help with the preparation of this manuscript and with data
collection, and Daniel Parks for hardware design and development.

Good metrics must be objective, intuitive, at the level of detail appropriate for decision
making, acceptable, precise, generalizable, sensitive, reliable, and, most important, valid
(ANSI, 1993). An important consideration in establishing a metric is that an individual’s
performance needs to be referenced to a criterion, for example, an expert. Expert performance
is considered the referent or gold standard against which to compare trainee performance
(Chi, Glaser, & Farr, 1988).

A potential reason for the absence of a valid objective measure in evaluating
marksmanship skill performance is the subtle nature of the actions involved. While position
quality, coarse movement of the muzzle, and the trigger break are easily observed by the
evaluator, the steps leading up to the trigger break (e.g., aiming, trigger squeeze, control of
respiration) are less perceptible, making direct visual observation and proper diagnosis
difficult.

One approach to objectively capturing and measuring subtle human movements is the
use of sensors. Sensors have the potential to serve as a reliable and unobtrusive surrogate in
situations where human observation is impractical (De Ketelaere, Bamelis, Kemps,
Decuypere, & De Baerdemaeker, 2004; Wide, Winquist, Bergsten, & Petriu, 1998). As a
methodology for evaluating human performance, sensors have already been shown to be
effective in the medical field in differentiating levels of experience of arthroscopic surgeons
(Chami, Ward, Phillips, & Sherman, 2008) and laparoscopic surgeons (Rosen, Solazzo,
Hannaford, & Sinanan, 2001).

The goal of this study was to test whether sensor-based measures designed to assess
key aspects of marksmanship skill are sensitive enough to differentiate between levels of
marksmanship skill performance. Sensors were developed, concentrating on two areas of
marksmanship believed to impact performance: breath control and trigger control. Each shot
was then evaluated, using expert criteria, to judge the quality of skill performance.
Methods



Participants 

Shots were collected from 39 participants, 30 novices and 9 experts. Novices ranged in
age from 19 to 29 years (M = 22.20, SD = 2.57). Of the 30 novices, 23 (77%) were male, and
7 (23%) were female. Twelve (40%) reported having prior experience shooting a rifle. Of
those reporting prior experience, 3 (25%) reported having shot a rifle within the last year, 3
(25%) within 3 to 5 years, and 6 (50%) reported firing a rifle over 5 years ago. None of the
novices reported experience with competitive shooting, and 2 (7%) reported having coached
rifle shooting.

All nine experts selected for the study were active-duty members of the armed forces with
a primary military occupational specialty (MOS) as marksmanship coaches. All were male and
ranged in age from 21 to 25 years (M = 23.33, SD = 1.41). Coaching experience ranged from
1 to 24 months (M = 12.44, SD = 7.52). In addition to being rifle marksmanship coaches, five
(56%) were also qualified as rifle marksmanship instructors.

While several subjects in the novice sample had some familiarity with marksmanship,
none had training consistent with marksmanship instruction as delivered in the armed forces.
Accordingly, all subjects in this sample were regarded as novices.

Design 

Holdout validation was used to assess the quality of shot classifications based on
estimated model parameters (Kerlinger & Pedhazur, 1973). Participants were randomly
assigned to one of two data sets, model training and model testing. Cases in the training set
were used to estimate model parameters, while observations in the testing set were held back
from the estimation procedure and later classified with the fitted model. The distribution of
subjects across data sets is presented in Table 1.
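The subject-level holdout split can be sketched as follows. This is a minimal illustration, not the authors' procedure: the report does not say whether the random assignment was stratified by status, so this sketch assumes stratification in order to reproduce the Table 1 counts, and the function name `split_subjects`, the subject IDs, and the seed are hypothetical. Assigning whole subjects (rather than individual shots) keeps all ten shots from one shooter in the same data set.

```python
import random

def split_subjects(subjects_by_status, n_train_by_status, seed=0):
    """Randomly assign whole subjects (not individual shots) to training or testing.

    subjects_by_status: dict mapping a status label to a list of subject IDs.
    n_train_by_status:  dict mapping the same labels to the number of subjects
                        placed in the training set (an assumed stratified design).
    """
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    train, test = [], []
    for status, subjects in subjects_by_status.items():
        shuffled = list(subjects)
        rng.shuffle(shuffled)
        k = n_train_by_status[status]
        train += [(s, status) for s in shuffled[:k]]
        test += [(s, status) for s in shuffled[k:]]
    return train, test

# Hypothetical subject IDs; group sizes and training counts follow Table 1.
subjects = {
    "novice": [f"N{i:02d}" for i in range(1, 31)],
    "expert": [f"E{i:02d}" for i in range(1, 10)],
}
train, test = split_subjects(subjects, {"novice": 15, "expert": 5})
print(len(train), len(test))  # 20 19
```

With ten shots per subject, this split yields 200 training shots and 190 testing shots.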

Table 1

Distribution of Subjects in Model Training and Model Testing Data

                        Data
Status       Training   Testing   Total
Novice          15         15       30
Expert           5          4        9
Total           20         19       39

Note. Ten shots were collected from each subject.
Apparatus

Data were collected in an indoor controlled environment. An instrumented weapon was
developed using off-the-shelf sensing components and a demilitarized M16A2 housing a
pneumatic recoil system designed to approximate the weight, noise, and action of a real
weapon firing real rounds (LaserShot, 2008). Four performance skill measures were collected
using two sensors: a force-pressure sensor attached to the trigger to measure the amount of
pressure exerted on the trigger during firing, and a respiration belt to measure
participants’ respiration. Both sensors were wired to a microprocessor, and data were
wirelessly downloaded onto a remote laptop. Shots were directed against a projected
circular target equivalent in size to a 20-inch-wide target at 200 yards. A camera identified
shot placement on target by recognizing infrared laser strikes delivered by the rifle. For more
detailed information on the development of the sensor-based measures and targeting system, see
Espinosa, Nagashima, Chung, Parks, and Baker (2009, CRESST Tech. Rep. No. 756).

Procedure 

All novices were provided basic instruction on shooting position, weapons handling, 
and proper sight alignment. Initial instructions were delivered to all novices by the same 
researcher. Although participants were instructed to shoot in the kneeling position, they were 
given the option of choosing between low, medium, or high kneeling. Variations in the 
kneeling position were modeled by the instructor; in addition, illustrations depicting left- and 
right-handed variations on the kneeling position were provided. Ten shots were collected and 
analyzed from each subject across two trials. No time constraints were imposed on the
shooters, and they were not given feedback on shot placement until the end of each
trial.

Measures 

Four performance skill measures were evaluated for each shot: three related to breath
control (breath location, breath duration, and shot-percent breath) and one to trigger
control (trigger duration).

Breath location represents the position in the respiratory cycle at trigger break. Values
can range from 0 to 100, with 0 indicating that the shot was taken while fully exhaled and
100 indicating that the shooter was fully inhaled. Doctrine dictates that shots be taken during
a natural respiratory pause; therefore, a value near zero is desirable.

Breath duration is a measure of the time, in seconds, between the full inhales flanking the
shot. Larger values indicate longer periods of time between breaths (slower rate of
respiration); conversely, smaller values indicate a shorter period of time between breaths
(faster rate of respiration).

Shot-percent breath approximates where the trigger break occurred relative to the two full
inhales spanning it, expressed as a proportion (0 to 1) of the interval between them. For
example, .50 indicates that a shot was fired equidistant from the two full inhales.
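The report defines the three breath measures but leaves the signal processing to Espinosa et al. (2009). The sketch below shows one way the definitions above could be computed from a sampled respiration-belt trace. Everything here is an assumption rather than the CRESST implementation: that larger belt readings mean more air in the lungs (so local maxima are full inhales), the simple peak detector, normalizing breath location to the minimum and maximum of the flanking cycle, and the function names.

```python
import math

def local_maxima(signal):
    """Indices of interior samples strictly above the previous sample and at
    least as high as the next one (a deliberately simple peak detector)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]]

def breath_measures(times, signal, shot_time):
    """Return (breath_location, breath_duration, shot_percent_breath) for one shot.

    Assumes belt readings rise with inhalation, so local maxima are full inhales.
    """
    peaks = local_maxima(signal)
    before = [i for i in peaks if times[i] <= shot_time]
    after = [i for i in peaks if times[i] > shot_time]
    if not before or not after:
        raise ValueError("shot is not flanked by two full inhales")
    i_prev, i_next = before[-1], after[0]
    # Sample closest to the trigger break.
    i_shot = min(range(len(times)), key=lambda i: abs(times[i] - shot_time))

    duration = times[i_next] - times[i_prev]               # seconds between full inhales
    shot_percent = (shot_time - times[i_prev]) / duration  # 0 to 1 between the inhales
    window = signal[i_prev:i_next + 1]
    lo, hi = min(window), max(window)
    location = 100.0 * (signal[i_shot] - lo) / (hi - lo)   # 0 = fully exhaled, 100 = fully inhaled
    return location, duration, shot_percent

# Synthetic 4-second breathing cycle sampled at 100 Hz for 20 s, with the
# shot fired at t = 10 s, the bottom of an exhale.
times = [i / 100 for i in range(2001)]
signal = [math.cos(2 * math.pi * t / 4) for t in times]
loc, dur, pct = breath_measures(times, signal, 10.0)
print(round(loc, 3), dur, pct)
```

On this idealized trace the shot falls at full exhale, midway between inhales 4 seconds apart, so the sketch returns a breath location of about 0, a breath duration of 4 seconds, and a shot-percent breath of .50. Real belt signals would need filtering before peak detection.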

Trigger duration was the only measure of trigger control and represents the amount of
time, in seconds, that pressure is exerted on the trigger before a shot is fired. Larger values
indicate a greater amount of time taken to pull the trigger.
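A minimal sketch of how trigger duration could be derived from the force-pressure sensor, assuming the measure is the length of the continuous run of above-threshold pressure that ends at the trigger break. The `threshold` value and the function name are hypothetical; the report does not state how pressure onset was detected.

```python
def trigger_duration(times, pressure, shot_time, threshold):
    """Seconds of continuous above-threshold trigger pressure ending at the shot.

    Walks backward from the last sample at or before the trigger break until the
    pressure drops to or below `threshold` (a hypothetical calibration value).
    """
    i_shot = max(i for i, t in enumerate(times) if t <= shot_time)
    if pressure[i_shot] <= threshold:
        return 0.0
    i = i_shot
    while i > 0 and pressure[i - 1] > threshold:
        i -= 1
    return shot_time - times[i]

# Synthetic trace: no pressure until t = 2.0 s, steady pressure until the shot at 3.5 s.
times = [i / 100 for i in range(501)]
pressure = [0.0 if t < 2.0 else 1.0 for t in times]
print(trigger_duration(times, pressure, 3.5, threshold=0.1))  # 1.5
```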

Analysis 

A logistic regression model was developed to test the extent to which shots can be 
classified as originating from a novice or expert using the skill measures as predictors. An 
extension of simple logistic regression is used to account for multiple predictors as follows: 

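$$\operatorname{logit}(Y) = \ln\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$

$$\pi = P(Y = \text{outcome of interest} \mid X_1 = x_1, X_2 = x_2, \ldots, X_p = x_p) = \frac{e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p}}{1 + e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p}}$$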

where π is the probability of the classification, α is the Y intercept, the βs are regression
coefficients, and the Xs are the set of predictor variables. The sign of a coefficient β
determines the direction of the relationship between X and the logit of Y. When β is greater
than zero, larger X values are associated with larger logits of Y. Conversely, if β is less than
zero, larger X values are associated with smaller logits of Y. The intercept α and the βs are
estimated using the maximum likelihood (ML) procedure, which chooses the parameter values
that maximize the likelihood of reproducing the observed data.
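The analysis relied on SPSS for estimation. As an illustration of the ML criterion only (not SPSS's code), the sketch below maximizes the logistic log-likelihood by Newton-Raphson, which for the logit link coincides with Fisher scoring. Standard errors come from the inverse of the information matrix at the solution; these are the standard errors that enter the Wald statistic. The data and "true" coefficients are simulated and made up.

```python
import numpy as np

def sigmoid(z):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum likelihood estimates of (alpha, beta_1, ..., beta_p) by Newton-Raphson.

    X: (n, p) array of predictors; y: length-n array of 0/1 outcomes.
    Returns the coefficients (intercept first) and their standard errors.
    """
    X1 = np.column_stack([np.ones(len(y)), X])         # column of ones carries alpha
    coef = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        pi = sigmoid(X1 @ coef)                        # fitted probabilities
        score = X1.T @ (y - pi)                        # gradient of the log-likelihood
        info = X1.T @ (X1 * (pi * (1 - pi))[:, None])  # information matrix
        step = np.linalg.solve(info, score)
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    pi = sigmoid(X1 @ coef)
    info = X1.T @ (X1 * (pi * (1 - pi))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))         # SEs used by the Wald test
    return coef, se

# Simulated data with made-up coefficients, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
true_coef = np.array([-0.5, 1.0, -2.0])
y = (rng.random(500) < sigmoid(true_coef[0] + X @ true_coef[1:])).astype(float)
coef, se = fit_logistic(X, y)
print("estimates:", coef.round(3), "SE:", se.round(3))
```

At convergence the score vector is essentially zero, which is the defining condition of the ML estimate. With completely separated classes the estimates would diverge, so a production fit would need a guard for that case.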

The outcome variable (Y) is expertise status (status) and was used to designate cases as
either expert or novice (1 = expert, 0 = novice). The logistic procedure predicts the “1”
category of the dependent variable, making the “0” category the reference category. The skill
measures were used as four continuous predictor variables: breath location, breath
duration, trigger duration, and shot-percent breath. The logistic regression analysis was
carried out using the binary logistic regression command in the Statistical Package for the
Social Sciences (SPSS®, 1999), version 16, in a Windows 2000 environment.
The statistical significance of individual regression coefficients (i.e., the βs) was tested
using the Wald chi-square statistic, and the Hosmer-Lemeshow (H-L) test was used to assess
the goodness of fit of the final logistic model.

Several indices describing predictive performance were calculated to assess the model's
classifications: sensitivity (true positive fraction), specificity (true negative
fraction), false positive, false negative, and the c statistic.

Sensitivity is the proportion of expert shots correctly classified as expert, and specificity is
the proportion of novice shots correctly classified as novice. False positive is the proportion of
shots classified as expert that were actually novice shots, while false negative is the proportion
of shots classified as novice that were actually expert shots.

The c statistic is a measure of discrimination, ranging from 0.5 to 1. A value of 0.5
indicates that the model is no better than assigning observations randomly into outcome
categories; a value of 1 indicates that the model assigns higher probabilities to all
observations with the event outcome than to all nonevent observations.
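The c statistic can be computed directly from its definition by comparing every (expert, novice) pair of shots, with tied predicted probabilities counted as one half. This is the usual convention, under which c equals the area under the ROC curve. The probabilities below are made up for illustration.

```python
def c_statistic(p_events, p_nonevents):
    """Proportion of (event, nonevent) pairs in which the event case receives the
    higher predicted probability; tied pairs count as one half."""
    concordant = 0.0
    for pe in p_events:
        for pn in p_nonevents:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(p_events) * len(p_nonevents))

# Made-up predicted probabilities: three expert shots and three novice shots.
c = c_statistic([0.9, 0.8, 0.4], [0.3, 0.5, 0.1])
print(c)  # 8 of the 9 pairs are concordant
```

The pairwise loop is O(n·m), which is negligible for 200 shots; a rank-based formula gives the same value faster for large samples.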

Results 

Descriptive Statistics 

Means and standard deviations for the skill variables are provided in Table 2. Across all
shots, breath location ranged from 0.00 to 91.80 (M = 35.34, SD = 23.86), breath duration
ranged from 0.31 to 13.16 seconds (M = 3.45, SD = 2.35), shot-percent breath ranged from
0.01 to 1.00 (M = .54, SD = .25), and trigger duration ranged from 0.00 to 95.27 seconds
(M = 4.32, SD = 8.06).

Mean breath location was 42.1 (SD = 22.9) for novices and 13.0 (SD = 8.7) for
experts. When a shot was fired, novices, on average, were partially inhaled, while experts
were close to fully exhaled at trigger break. The mean breath duration was only
2.5 seconds (SD = 1.1) for novices and 6.5 seconds (SD = 2.8) for experts. These values indicate
that the average respiratory cycle around the trigger break lasted 2.5 seconds for novices,
whereas for experts it lasted 6.5 seconds. The mean shot-percent breath was .52 (SD = .27)
for novices and .64 (SD = .17) for experts. The novice group mean of .52, or 52%,
indicates that the average shot was fired midway between two full inhales, while the expert
group mean of .64, or 64%, indicates that experts took shots closer to the
end of a respiratory cycle. For the measure of trigger control, the mean trigger duration was
5.2 seconds (SD = 8.9) for novices and 1.4 seconds (SD = 2.3) for experts; novices appear
to take longer to pull the trigger.
Table 2

Mean and Standard Deviation for Skill Measures

                                       Status
                         Novice          Expert          All subjects
Variables                M (SD)          M (SD)          M (SD)
Breath control
  Breath location        42.05 (22.85)   12.97 (8.71)    35.34 (23.86)
  Breath duration         2.54 (1.10)     6.46 (2.85)     3.45 (2.35)
  Shot-percent breath     0.52 (0.27)     0.64 (0.17)     0.54 (0.25)
Trigger control
  Trigger duration        5.20 (8.93)     1.42 (2.29)     4.32 (8.06)

Note. n = 300 novice shots and n = 90 expert shots.
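The "All subjects" column of Table 2 can be checked against the group columns, since the overall mean and standard deviation of the pooled 390 shots follow from each group's n, mean, and SD. The sketch below is a consistency check written for this purpose (the function name is ours). Because it starts from the rounded group statistics, it reproduces the reported totals only to within rounding.

```python
import math

def pooled_mean_sd(groups):
    """Combine per-group (n, mean, sd) summaries into the mean and sample SD
    of all observations taken together."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Within-group sums of squares plus between-group sums of squares.
    ss = sum((n - 1) * sd ** 2 + n * (m - grand_mean) ** 2 for n, m, sd in groups)
    return grand_mean, math.sqrt(ss / (n_total - 1))

# Breath location: 300 novice shots and 90 expert shots (Table 2).
mean, sd = pooled_mean_sd([(300, 42.05, 22.85), (90, 12.97, 8.71)])
print(f"{mean:.2f} ({sd:.2f})")  # 35.34 (23.86)
```

The large between-group term in the sum of squares is why the overall SD for breath location (23.86) exceeds the novice SD (22.85).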



Pearson correlations among the skill variables are reported in Table 3. The correlation
of breath location and breath duration was significant, r(388) = -.488, p < .001, as were the
correlations of breath location and shot-percent breath, r(388) = -.216, p < .001, and of
breath duration and shot-percent breath, r(388) = .280, p < .001. Trigger duration did not
correlate significantly with the other variables.



Table 3

Correlations for Measures of Skill Performance

Variables                  1         2         3        4
1. Breath location         -
2. Breath duration      -.488**      -
3. Shot-percent breath  -.216**    .280**      -
4. Trigger duration       .096     -.049      .005      -

Note. N = 390.
**p < .01 (two-tailed).



Given the significant correlations among the breath control variables,
tolerance values were calculated to assess the threat of collinearity. Tolerance values ranged
from .730 for breath duration to .990 for trigger duration (Table 4). Because all values are well
above the critical value of .20, below which collinearity becomes a concern (Menard, 1995),
the threat of collinearity appears negligible.
Table 4

Collinearity Statistics for Independent Variables

Variable                 Tolerance
Breath control
  Breath location          .750
  Breath duration          .730
  Shot-percent breath      .913
Trigger control
  Trigger duration         .990

Note. Tolerance is equivalent to 1/variance inflation factor.
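Tolerance for predictor j is 1 - R²_j, where R²_j comes from regressing predictor j on the other predictors. It can also be read off the inverse of the predictors' correlation matrix, whose j-th diagonal element is the variance inflation factor. The sketch below applies this identity to the rounded correlations in Table 3 as a consistency check. The report's values were presumably produced by SPSS from the raw data, so small differences from Table 4 are expected.

```python
import numpy as np

# Pearson correlations from Table 3, in the order breath location,
# breath duration, shot-percent breath, trigger duration.
R = np.array([
    [ 1.000, -0.488, -0.216,  0.096],
    [-0.488,  1.000,  0.280, -0.049],
    [-0.216,  0.280,  1.000,  0.005],
    [ 0.096, -0.049,  0.005,  1.000],
])

# Diagonal of R^-1 gives VIF_j = 1 / (1 - R_j^2); tolerance is its reciprocal.
vif = np.diag(np.linalg.inv(R))
tolerance = 1.0 / vif
names = ["breath location", "breath duration", "shot-percent breath", "trigger duration"]
for name, tol in zip(names, tolerance):
    print(f"{name:20s} {tol:.3f}")
```

The recomputed tolerances agree closely with Table 4, and all are far above the .20 threshold.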



Logistic Regression Model 

We first estimated separate univariate logistic regression models in the training data to test
whether each individual measure of skill performance was related to the likelihood of
classification as expert. Again, the outcome variable, status, was used to designate the shot
classification as expert marksman (1 = yes, 0 = no). The four continuous predictor variables
include the three variables for breath control (breath location, breath duration, shot-percent
breath) and one measure for trigger control (trigger duration). Table 5 presents the results of
the analysis of the univariate relationship between each skill measure and predicted
marksmanship status.

Table 5

Summary of Univariate Logistic Regression Results for Marksmanship Skill Variables Using Training Data
(n = 200)

Variables                 B       SE     Wald statistic   OR         95% CI
Breath control
  Breath location       -0.137   0.022   37.559            0.872**   .834-.911
  Breath duration        2.039   0.335   37.060            7.686**   3.986-14.820
  Shot-percent breath    2.892   0.758   14.572           18.029**   4.084-79.584
Trigger control
  Trigger duration      -0.237   0.075    9.902            0.789*    .681-.915

Note. OR = odds ratio, CI = confidence interval.
*p < .01. **p < .001.



Considered individually, all four variables are significant (p < .01) predictors relative to
the null model. Next, we estimated a multiple logistic regression model to investigate the
simultaneous effects of all four skill measures on status. Given the significance of the four
predictor variables in the univariate models, a four-predictor multiple logistic model was fitted
to the data. Table 6 presents the results of the multiple logistic regression analysis.



Table 6

Summary of Multiple Logistic Regression Results for Marksmanship Skill Variables Using Training Data (n = 200)

Variables                 B       SE     Wald statistic   OR          95% CI
Breath control
  Breath location       -0.148   0.052    7.946           0.862**     .778-.956
  Breath duration        2.111   0.491   18.502           8.256***    3.155-21.604
  Shot-percent breath    2.241   3.752    0.357           9.398       .006-14691
Trigger control
  Trigger duration      -0.540   0.200    7.282           0.583**     .393-.863
Constant                -6.381   2.604    6.003           0.002*

Note. OR = odds ratio, CI = confidence interval.
*p < .05. **p < .01. ***p < .0001.
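Each row of Tables 5 and 6 follows from B and SE alone: the Wald statistic is (B/SE)², the odds ratio is e^B, and the Wald 95% confidence interval is e^(B ± 1.96·SE). The sketch below recomputes the trigger duration row of Table 6 from these standard formulas. The function name is ours. Small discrepancies with the table are expected because the published B and SE are rounded.

```python
import math

def wald_summary(b, se, z=1.959964):
    """Wald chi-square (1 df), odds ratio, and Wald 95% CI for one logistic coefficient."""
    wald = (b / se) ** 2
    odds_ratio = math.exp(b)
    ci = (math.exp(b - z * se), math.exp(b + z * se))
    return wald, odds_ratio, ci

# Trigger duration in the multiple model (Table 6): B = -0.540, SE = 0.200.
wald, odds_ratio, ci = wald_summary(-0.540, 0.200)
print(f"Wald = {wald:.3f}, OR = {odds_ratio:.3f}, 95% CI = {ci[0]:.3f} to {ci[1]:.3f}")
```

The very wide interval for shot-percent breath (.006 to 14691) follows from its large standard error (3.752): e raised to ±1.96 × 3.752 spans more than six orders of magnitude.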



When all four predictors are considered jointly, the overall model significantly
differentiates between expert and novice skill performance relative to the null model,
χ²(4) = 188.18, p < .001. The variables breath location, breath duration, and trigger
duration are significant (p < .05). The variable shot-percent breath, while a significant
predictor when used alone, was not a significant predictor when entered together with the
other three variables. The Wald test of the intercept (i.e., the constant in Table 6) was also
significant (p < .05), suggesting that the intercept should be included in the model.

The log odds of expert classification is as follows: 

$$\ln\left(\frac{\pi}{1-\pi}\right) = -6.381 - 0.148(\text{breath location}) + 2.111(\text{breath duration}) + 2.241(\text{shot-percent breath}) - 0.540(\text{trigger duration})$$

When interpreting the logistic regression results, an odds ratio greater than 1.0 implies
a positive association between the skill measure and status, while an odds ratio less than 1.0
implies a negative association. Odds ratios close to 1.0 indicate that unit changes in that skill
variable have little effect on the odds of predicted status. The odds ratio of .862 for breath
location indicates that as breath location increases, the odds of expert classification diminish.
Specifically, each one-unit increase in breath location multiplies the odds of expert
classification by .862 (a decrease of about 13.8%), controlling for the other variables in the
model. For breath duration, each one-second increase multiplies the odds of expert
classification by 8.26. Lastly, the odds ratio of .583 for trigger duration signifies that for every
one-second increase in trigger duration, the odds of expert classification are multiplied by .583,
a decrease of about 41.7%. The variable shot-percent breath was not a significant predictor.
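The fitted equation can be turned into a predicted probability and a classification with the .50 cutoff used in Tables 7 and 8. The sketch below hard-codes the Table 6 coefficients. Feeding it the group means from Table 2 is only an illustration, not a reanalysis: the means are not actual shots. How SPSS treats a probability exactly equal to the cutoff is not stated in the report, so the `>=` comparison is an assumption.

```python
import math

# Coefficients of the multiple logistic model (Table 6).
INTERCEPT = -6.381
COEF = {
    "breath_location": -0.148,
    "breath_duration": 2.111,
    "shot_percent_breath": 2.241,
    "trigger_duration": -0.540,
}

def expert_probability(shot):
    """Predicted probability that a shot comes from an expert (inverse of the log-odds equation)."""
    logit = INTERCEPT + sum(COEF[k] * shot[k] for k in COEF)
    return 1.0 / (1.0 + math.exp(-logit))

def classify(shot, cutoff=0.50):
    # Treating a probability exactly at the cutoff as "expert" is an assumption.
    return "expert" if expert_probability(shot) >= cutoff else "novice"

# Illustrative inputs: the group means from Table 2, not actual shots.
novice_mean = {"breath_location": 42.05, "breath_duration": 2.54,
               "shot_percent_breath": 0.52, "trigger_duration": 5.20}
expert_mean = {"breath_location": 12.97, "breath_duration": 6.46,
               "shot_percent_breath": 0.64, "trigger_duration": 1.42}
for label, shot in [("novice mean", novice_mean), ("expert mean", expert_mean)]:
    print(label, round(expert_probability(shot), 4), classify(shot))

# One extra second of breath duration multiplies the odds by exp(2.111), about 8.26.
print(round(math.exp(COEF["breath_duration"]), 2))
```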

Overall Model Evaluation 

The Hosmer-Lemeshow test of goodness of fit yielded χ²(8) = .196 and was
nonsignificant (p > .05), suggesting that the model fits the data adequately. In other words,
the null hypothesis that the model fits the data could not be rejected. The
logistic model resulted in a c statistic of .973, indicating that for 97.3% of all possible pairs
of shots (one expert and the other novice), the model assigned a higher probability
to the expert shot.
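The Hosmer-Lemeshow statistic sorts cases by predicted probability, cuts them into groups (usually ten), and compares observed with expected numbers of expert shots in each group. The statistic is referred to a chi-square distribution with groups - 2 degrees of freedom, which gives the 8 df reported above. The sketch below is a plain textbook version. SPSS forms its groups in its own way (for example, in its handling of tied probabilities), so this sketch may not reproduce the reported statistic exactly. The closed-form p-value used here is valid only for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(p, y, groups=10):
    """Hosmer-Lemeshow chi-square statistic, p-value, and df (= groups - 2).

    Cases are sorted by predicted probability and split into `groups` bins of
    (nearly) equal size. Assumes no bin has a mean probability of exactly 0 or 1.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)    # observed expert shots in the bin
        expected = sum(p[i] for i in idx)    # expected expert shots in the bin
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    df = groups - 2
    return stat, chi2_sf_even_df(stat, df), df

# Toy check: each bin's observed count equals its expected count, so the statistic is 0.
p = [0.3] * 100
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] * 10
stat, p_value, df = hosmer_lemeshow(p, y)
print(round(stat, 6), round(p_value, 6), df)
```

A small statistic gives a large p-value. With 8 df, a statistic of .196 corresponds to a p-value very close to 1.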

For the model training data, 147 of 150 novice shots and 46 of 50 expert shots were
accurately classified. Accordingly, the sensitivity, the ability to identify expert shots, was
92%, and the specificity, the ability to identify novice shots, was 98% for the training data.
For the model testing data, 144 of 150 novice shots and 27 of 40 expert shots were accurately
classified, resulting in a sensitivity of 67.5% and a specificity of 96%. The 2 × 2 classification
tables showing observed versus predicted classifications, based on a cutoff value of .50, or
50%, are presented in Table 7 for the training data and Table 8 for the testing data.

Table 7

Summary of Predicted Classification for Model Training Data

                         Predicted
Observed             Novice    Expert    % Correct
Novice                 147         3         98.0
Expert                   4        46         92.0
Overall % correct                            96.5

Note. Cut value set at .50.
Table 8

Summary of Predicted Classification for Model Testing Data

                         Predicted
Observed             Novice    Expert    % Correct
Novice                 144         6         96.0
Expert                  13        27         67.5
Overall % correct                            90.0

Note. Cut value set at .50.



Table 9

Classification Performance for Testing Data

Measure          Computation         Value   Definition
Sensitivity      27 / (27 + 13)      0.675   The proportion of correctly classified events (expert).
Specificity      144 / (6 + 144)     0.960   The proportion of correctly classified nonevents (novice).
False positive   6 / (6 + 27)        0.182   The proportion of observations misclassified as expert
                                             over all of those classified as expert.
False negative   13 / (13 + 144)     0.083   The proportion of observations misclassified as novice
                                             over all of those classified as novice.



As shown in Table 9, predictions for experts were less accurate than those for novices.
This observation is supported by the magnitude of sensitivity (67.5%) compared with that
of specificity (96.0%). Both false positive and false negative rates were modest, at 18.2%
and 8.3%, respectively. Given the distribution of experts and novices across the two data
sets, the default accuracy obtained by classifying all cases as novice (the most prevalent
classification) was 75% in the training data and 78.9% in the testing data. Compared with
the overall percent correct classification in the training data (96.5%) and the testing data
(90.0%), this represents improvements of 21.5 and 11.1 percentage points, respectively.
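The indices in Table 9 and the comparison with the all-novice baseline can all be computed from the 2 × 2 counts in Tables 7 and 8. The sketch below follows the definitions given in Table 9. Note that "false positive" and "false negative" there are proportions of predicted experts and predicted novices, respectively, not of observed groups. The function name is ours.

```python
def classification_summary(tp, fn, fp, tn):
    """Indices from a 2 x 2 table, following the definitions in Table 9.

    tp: experts predicted expert      fn: experts predicted novice
    fp: novices predicted expert      tn: novices predicted novice
    """
    n = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Share of predicted experts that are novice shots.
        "false_positive": fp / (fp + tp),
        # Share of predicted novices that are expert shots.
        "false_negative": fn / (fn + tn),
        "accuracy": (tp + tn) / n,
        # Baseline: classify every shot as the majority class (novice).
        "baseline": max(tp + fn, fp + tn) / n,
    }

training = classification_summary(tp=46, fn=4, fp=3, tn=147)   # Table 7
testing = classification_summary(tp=27, fn=13, fp=6, tn=144)    # Table 8
for name, s in [("training", training), ("testing", testing)]:
    gain = 100 * (s["accuracy"] - s["baseline"])
    print(f"{name}: accuracy {s['accuracy']:.3f}, baseline {s['baseline']:.3f}, "
          f"gain {gain:.1f} percentage points")
```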

Discussion 

In this study, our objective was to test whether sensor-based skill measures provide
discriminatory power in differentiating novice and expert skill performance. A key finding of
our analysis is that sensor-based skill measures, considered jointly, provide a reliable method
of discriminating expert from novice marksmanship performance. Specifically,
breath location, breath duration, and trigger duration proved to be significant predictors in
expert-novice shot classification.
In evaluating the predicted classifications, the training data exhibited an accuracy rate of
96.5% and the testing data only 90.0%. One possible explanation for this discrepancy is
variability in skill performance across experts. Whereas the 6 false positive cases in the
testing data are distributed nearly evenly across five novice shooters, all 13 false negative
cases are concentrated in only two experts: one expert shooter accounted for 9 of the false
negative classifications, and another accounted for the remaining 4. Given that the false
negatives in the testing data came from just two expert shooters, there is reason to believe
that, even across experts, there is a considerable amount of variability in skill performance.
Because experts were selected in part on the basis of their role as active-duty marksmanship
coaches, further refinement of the expert group into subgroups may lead to improved
predictions and shed light on additional levels of skill performance.

Although we are confident in the ability of sensor-based measures to differentiate
skill performance in this setting, we remain cautious in generalizing these results to
live-fire environments. Additional studies are needed to assess the reliability of sensor-based
assessment for supporting skill diagnosis in live-fire environments.